Abstract. We call a CNF formula linear if any two clauses have at most one variable in
common. We show that there exist unsatisfiable linear k-CNF formulas with at most $4k^2 4^k$
clauses, and on the other hand, any linear k-CNF formula with at most $\frac{4^k}{8e^2k^2}$
clauses is satisfiable. The upper bound uses probabilistic means, and we have no explicit
construction coming even close to it. One reason for this is that unsatisfiable linear
formulas exhibit a more complex structure than general (non-linear) formulas: First, any
treelike resolution refutation of any unsatisfiable linear k-CNF formula has size at least
$2^{2^{k/2-1}}$. This implies that small unsatisfiable linear k-CNF formulas are hard
instances for Davis-Putnam style splitting algorithms. Second, if we require that the
formula F have a strict resolution tree, i.e. every clause of F is used only once in the
resolution tree, then we need at least $a^{a^{\cdot^{\cdot^{\cdot^{a}}}}}$ clauses, where
$a \approx 2$ and the height of this tower is roughly k.



1. Introduction 

How can CNF formulas become unsatisfiable? Roughly speaking, there are two ways:
Either some constraint (clause) is itself impossible to satisfy (the empty clause); or every
clause can be satisfied individually, but one cannot satisfy all of them simultaneously. In the
latter case, the clauses have to somehow overlap. How much? For example, take k boolean
variables $x_1, \dots, x_k$. The conjunction of all $2^k$ possible clauses of size k is the
complete k-CNF formula, which we denote by $\mathcal{K}_k$. It is unsatisfiable, and as small
as possible: Any k-CNF formula with fewer than $2^k$ clauses is satisfiable. Clearly, the
clauses of $\mathcal{K}_k$ overlap a lot. What if we require that any two distinct clauses
share at most one variable? We call such a formula linear. There are unsatisfiable linear
k-CNF formulas, but they are significantly larger and have a much more complex structure than
$\mathcal{K}_k$.



A CNF formula is a conjunction (AND) of clauses, and a clause is a disjunction (OR)
of literals. A literal is either a boolean variable $x$ or its negation $\bar{x}$. We require
that a clause does not contain the same literal twice, and does not contain complementary
literals, i.e., both $x$ and $\bar{x}$. To simplify notation, we also regard formulas as sets
of clauses and clauses as sets of literals. A clause with k literals is a k-clause, and a
k-CNF formula is a CNF formula consisting of k-clauses. For a clause C, we denote by vbl(C)
the set of variables $x$ with $x \in C$ or $\bar{x} \in C$. Consequently, a CNF formula F is
linear if $|\mathrm{vbl}(C) \cap \mathrm{vbl}(D)| \le 1$ for any two distinct clauses
$C, D \in F$. As a relaxation of this notion, we call F weakly linear if $|C \cap D| \le 1$
for any distinct $C, D \in F$.

Example. The formula $(x_1 \vee x_2) \wedge (\bar{x}_2 \vee x_3) \wedge (\bar{x}_3 \vee x_4) \wedge (\bar{x}_4 \vee x_1)$
is linear, whereas $(x_1 \vee x_2) \wedge (\bar{x}_1 \vee x_2) \wedge (x_2 \vee x_3)$ is weakly
linear, but not linear, and finally $(\bar{x}_1 \vee x_2 \vee x_3) \wedge (x_1 \vee x_2 \vee x_3)$
is not weakly linear (and not linear, either).
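The two definitions can be checked mechanically. The following is a minimal sketch, assuming a representation that is not part of the paper: a literal is a nonzero integer (negative for a negated variable) and a clause is a frozenset of literals. The function names are illustrative only.

```python
from itertools import combinations

def vbl(clause):
    """Set of variables occurring in a clause (sign of the literal dropped)."""
    return {abs(lit) for lit in clause}

def is_linear(formula):
    """True if any two distinct clauses share at most one variable."""
    return all(len(vbl(c) & vbl(d)) <= 1 for c, d in combinations(formula, 2))

def is_weakly_linear(formula):
    """True if any two distinct clauses share at most one literal."""
    return all(len(c & d) <= 1 for c, d in combinations(formula, 2))

# Three small formulas in the spirit of the example, with one possible choice of signs.
cycle = [frozenset(c) for c in ({1, 2}, {-2, 3}, {-3, 4}, {-4, 1})]
weak_only = [frozenset(c) for c in ({1, 2}, {-1, 2}, {2, 3})]
neither = [frozenset(c) for c in ({-1, 2, 3}, {1, 2, 3})]
```

Every linear formula is weakly linear, since two clauses sharing two literals also share two variables.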

It is not very difficult to construct an unsatisfiable linear 2-CNF formula, but
significantly more effort is needed for a 3-CNF formula. It is not obvious whether
unsatisfiable linear k-CNF formulas exist for every k. These questions were first asked by
Porschen, Speckenmeyer and Randerath [15], who also proved that for any $k \ge 3$, if an
unsatisfiable linear k-CNF formula exists, then deciding satisfiability of linear k-CNF
formulas is NP-complete. Later, Porschen, Speckenmeyer and Zhao [16] and, independently,
myself [18] gave a construction of unsatisfiable linear k-CNF formulas, for every
$k \in \mathbb{N}_0$:

Theorem 1.1 ([16], [18]). For every $k \ge 0$, there exists an unsatisfiable linear k-CNF
formula $F_k$, with $F_0$ containing one clause and $F_{k+1}$ containing $|F_k| 2^{|F_k|}$
clauses.

The $|F_k|$ are extremely large. Here, we will give an almost optimal construction.

Theorem 1.2. All weakly linear k-CNF formulas with at most $\frac{4^k}{8e^2(k-1)^2}$ clauses
are satisfiable. There exists an unsatisfiable linear k-CNF formula with at most $4k^2 4^k$
clauses.

It is a common phenomenon in extremal combinatorics that by probabilistic means one
can show that a certain object exists (in our case, a "small" linear unsatisfiable k-CNF
formula), but one cannot explicitly construct it. We have no explicit construction avoiding
the tower-like growth in Theorem 1.1. We give some arguments why this is so, and show
that small linear unsatisfiable k-CNF formulas have a more complex structure than their
non-linear relatives. To do so, we speak about resolution.

1.1. Resolution Trees 

If C and D are clauses and there is a unique literal $u$ such that $u \in C$ and
$\bar{u} \in D$, then $(C \setminus \{u\}) \cup (D \setminus \{\bar{u}\})$ is called the
resolvent of C and D. It is easy to check that every assignment satisfying C and D also
satisfies the resolvent.
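As an illustration of the resolution rule, here is a small sketch under the same assumed encoding as before (nonzero integers as literals, frozensets as clauses); the function name is hypothetical. It returns None when there is no unique clashing literal, i.e. when the resolvent is not defined.

```python
def resolvent(c, d):
    """Resolvent of clauses c and d, or None if there is no unique literal u
    with u in c and its complement in d."""
    clashing = [u for u in c if -u in d]
    if len(clashing) != 1:
        return None
    u = clashing[0]
    return (c - {u}) | (d - {-u})
```

Resolving the unit clauses {x} and {not x} yields the empty clause, which is the label of the root in any resolution tree.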

Definition 1.3. A resolution tree for a CNF formula F is a tree T whose vertices are 
labeled with clauses, such that 

• each leaf of T is labeled with a clause of F, 

• the root of T is labeled with the empty clause, 

• if vertex a has children b and c, and these are labeled with clauses $C_a, C_b, C_c$,
respectively, then $C_a$ is the resolvent of $C_b$ and $C_c$.

It is well-known that a CNF formula F is unsatisfiable if and only if it has a resolution
tree (which can be exponentially large in $|F|$). Proving lower bounds on the size of resolution
trees (and general resolution proofs, which we will not introduce here) has been and still is
an area of intensive research. See for example Ben-Sasson and Wigderson [2].

Theorem 1.4. Let $k \ge 2$. Every resolution tree of an unsatisfiable weakly linear k-CNF
formula has at least $2^{2^{k/2-1}}$ leaves.

A large ratio between the size of F and the size of a smallest resolution tree is an
indication that F has a complex structure. For example, it is well-known that the running time
of so-called Davis-Putnam procedures on a formula F is lower bounded by the size of the
smallest resolution tree of F (actually those procedures were introduced by Davis, Logemann
and Loveland [3]). Such a procedure tries to find a satisfying assignment for a formula F
(or to prove that none exists) by choosing a variable $x$, and then recursing on the formulas
$F^{[x \mapsto 0]}$ and $F^{[x \mapsto 1]}$, obtained from F by fixing the value of $x$ to 0
or 1, respectively. If F is unsatisfiable, the procedure implicitly constructs a resolution
tree.
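The splitting scheme just described can be sketched as follows. This is a bare-bones illustration, not the procedure of [3]: it has no unit propagation or other heuristics, the variable choice (smallest variable of the first clause) is an arbitrary assumption, and the encoding (integer literals, frozenset clauses) and all names are hypothetical. It also counts the leaves of its recursion tree.

```python
def restrict(clauses, var, value):
    """Formula obtained by fixing var to value: satisfied clauses are dropped,
    the falsified literal is deleted from the remaining clauses."""
    true_lit = var if value else -var
    return [c - {-true_lit} for c in clauses if true_lit not in c]

def split(clauses):
    """Return (satisfiable, leaves), where leaves counts the leaves of the
    recursion tree explored by the splitting procedure."""
    if not clauses:
        return True, 1                      # no clause left: satisfied
    if any(len(c) == 0 for c in clauses):
        return False, 1                     # empty clause: dead end
    var = min(abs(lit) for lit in clauses[0])   # arbitrary choice (assumption)
    sat0, leaves0 = split(restrict(clauses, var, False))
    if sat0:
        return True, leaves0
    sat1, leaves1 = split(restrict(clauses, var, True))
    return sat1, leaves0 + leaves1

def complete_formula(k):
    """All 2^k clauses of size k over the variables 1..k."""
    clauses = [frozenset()]
    for v in range(1, k + 1):
        clauses = [c | {s * v} for c in clauses for s in (1, -1)]
    return clauses
```

On the complete k-CNF formula this sketch explores a recursion tree with exactly $2^k$ leaves, one per clause.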

A CNF formula F is minimal unsatisfiable if it is unsatisfiable, and for every clause
$C \in F$, $F \setminus \{C\}$ is satisfiable. The complete k-CNF formula introduced above is
minimal unsatisfiable, and has a resolution tree with $2^k$ leaves, one for every clause. This
is as small as possible, since for a minimal unsatisfiable formula, every clause must appear
as label of at least one leaf of any resolution tree. We call a resolution tree strict if no
two leaves are labeled by the same clause, and a formula F strictly treelike if it has a
strict resolution tree. In some sense, strictly treelike formulas are the least complex
formulas possible. For example, the complete formula $\mathcal{K}_k$ and the formulas
constructed in the proof of Theorem 1.1 are strictly treelike.

Theorem 1.5. For any $\epsilon > 0$, there exists a constant c such that for any
$k \in \mathbb{N}$, any strictly treelike weakly linear k-CNF formula has at least
$\mathrm{tower}_{2-\epsilon}(k - c)$ clauses, where $\mathrm{tower}_a(n)$ is defined by
$\mathrm{tower}_a(0) = 1$ and $\mathrm{tower}_a(n+1) = a^{\mathrm{tower}_a(n)}$.
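To get a feeling for the growth of the tower function, its recursive definition can be evaluated directly. This is a small sketch with a hypothetical function name; for base 2 and integer arguments Python's arbitrary-precision integers keep the values exact.

```python
def tower(a, n):
    """tower_a(n) as defined above: tower_a(0) = 1, tower_a(n+1) = a ** tower_a(n)."""
    value = 1
    for _ in range(n):
        value = a ** value
    return value
```

For base 2 the first values are 1, 2, 4, 16, 65536, and tower(2, 5) already has 19729 decimal digits.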

Strictly treelike formulas appear in other contexts, too. Consider MU(1), the class
of minimal unsatisfiable formulas whose number of variables is one less than the number
of clauses. A result of Davydov, Davydova and Kleine Büning ([5], Theorem 12) implies
that every MU(1)-formula is strictly treelike. Also, MU(1)-formulas serve as "universal
patterns" for unsatisfiable formulas: Szeider [19] shows that a formula F is unsatisfiable if
and only if it can be obtained from a MU(1)-formula G by renaming the variables of G (in
a possibly non-injective manner). It is not difficult to show that a strictly treelike linear
k-CNF formula can be transformed into a linear MU(1)-formula with the same number of
clauses.

1.2. Related Work 

For a CNF formula F and a variable $x$, let $d_F(x)$ denote the degree of $x$, i.e. the
number of clauses of F containing $x$ or $\bar{x}$, and let $d(F) := \max_x d_F(x)$ denote the
maximum degree of F. For the complete k-CNF formula $\mathcal{K}_k$, we have
$d(\mathcal{K}_k) = 2^k$. Intuitively, in an unsatisfiable k-CNF formula, some variables should
occur in many clauses. In other words, the following function should be large:

$f(k) := \max\{d \mid \text{every k-CNF formula } F \text{ with } d(F) \le d \text{ is satisfiable}\}$ .   (1.1)
The function $f(k)$ was first investigated by Tovey [20], who showed $f(k) \ge k$, using
Hall's Theorem. Using the famous Lovász Local Lemma (see [5] for the original proof, or [1]
for several generalized versions), Kratochvíl, Savický and Tuza [12] proved that
$f(k) \ge \lfloor 2^k/(ek) \rfloor$ and that while all k-CNF formulas F with $d(F) \le f(k)$
are trivially satisfiable, deciding satisfiability of k-CNF formulas F with
$d(F) \le f(k) + 1$ is already NP-complete, for $k \ge 3$. For $k = 3$, this was already
observed in [20]. For an upper bound, the complete k-CNF formula witnesses that
$f(k) \le 2^k - 1$. Savický and Sgall [17] showed $f(k) \in O(k^{-0.26} 2^k)$. This was
improved by Hoory and Szeider [9] to $f(k) \in O\left(\frac{\ln(k) 2^k}{k}\right)$, and
recently Gebauer [7] proved that $f(k) \le \frac{2^{k+2}}{k} - 1$. Thus, $f(k)$ is known up to
a constant factor. The best upper bounds on $f(k)$ come from MU(1)-formulas. This is true for
large values of k, since the formulas constructed in [7] are MU(1), as well as for small
values: Hoory and Szeider [8] show that the function $f(k)$, when restricted to
MU(1)-formulas, is computable (in general this is not known), and derive the currently
best-known bounds on $f(k)$ for small k ($k \le 9$). To summarize: When we try to find
unsatisfiable k-CNF formulas minimizing a certain parameter, like number of clauses or maximum
degree, strictly treelike formulas do an excellent job. However, if we try to construct a
small unsatisfiable linear k-CNF formula, they perform horribly. Just compare our upper bound
in Theorem 1.2 with the lower bound for strictly treelike formulas in Theorem 1.5.

While interest in linear CNF formulas is rather young, linear hypergraphs have been
studied for quite some time. A hypergraph $H = (V,E)$ is linear if $|e \cap f| \le 1$ for any
two distinct hyperedges $e, f \in E$. A k-uniform hypergraph is a hypergraph where every
hyperedge has cardinality k. We ask when a hypergraph is 2-colorable, i.e., admits a 2-coloring
of its vertices such that no hyperedge becomes monochromatic. Bounds on the number of
edges in such a hypergraph were given by Erdős and Lovász [5] (interestingly, this is the
paper where the Local Lemma was proven). They show that there are non-2-colorable
linear k-uniform hypergraphs with $ck^4 4^k$ hyperedges, but not with fewer than
$c' 4^k/k^3$. The proof of the lower bound directly translates into our lower bound for linear
k-CNF formulas. For the number of edges in linear k-uniform hypergraphs that are not
2-colorable, the currently best upper bound is $ck^2 4^k$ by Kostochka and Rödl [11], and the
best lower bound is $k^{-\epsilon} 4^k$, for any $\epsilon > 0$ and sufficiently large k, due
to Kostochka and Kumbhat [10].

2. Existence and Upper and Lower Bounds 

Proof of Theorem 1.1. Choose $F_0$ to be the formula consisting of only the empty clause.
Suppose we have constructed $F_k$, and want to construct $F_{k+1}$. Let $m = |F_k|$. We create
m new variables $x_1, \dots, x_m$, and let $\mathcal{K}_m = \{D_1, D_2, \dots, D_{2^m}\}$ be the
complete m-CNF formula over $x_1, \dots, x_m$. It is unsatisfiable, but not linear. We take
$2^m$ variable-disjoint copies of $F_k$, denoted by $F_k^{(1)}, F_k^{(2)}, \dots, F_k^{(2^m)}$.
For each $1 \le i \le 2^m$, we build a linear (k+1)-CNF formula $\tilde{F}_k^{(i)}$ from
$F_k^{(i)}$ by adding, for each $1 \le j \le m$, the j-th literal of $D_i$ to the j-th clause
of $F_k^{(i)}$. Note that every assignment satisfying $\tilde{F}_k^{(i)}$ also satisfies $D_i$:
since $F_k^{(i)}$ is unsatisfiable, at least one of the added literals must be true. Finally,
we set $F_{k+1} := \bigcup_{i=1}^{2^m} \tilde{F}_k^{(i)}$. This is an unsatisfiable linear
(k+1)-CNF formula with $m 2^m$ clauses. ■
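The construction in this proof is easy to replay for the first few values of k. The sketch below assumes an encoding that is not from the paper (literals as nonzero integers, clauses as frozensets, variables numbered 1, 2, ...) and uses hypothetical helper names; the brute-force satisfiability test is only feasible for $F_1$ and $F_2$, since $F_3$ already has 2048 clauses over more than 1500 variables.

```python
from itertools import combinations, product

def next_formula(clauses, num_vars):
    """Build F_{k+1} from F_k, whose variables are assumed to be 1..num_vars.
    Returns the new clause list and its number of variables."""
    m = len(clauses)
    new_clauses = []
    offset = m                      # variables 1..m act as the new x_1..x_m
    for signs in product((1, -1), repeat=m):      # one sign pattern per D_i
        for j, clause in enumerate(clauses):
            # variable-disjoint copy of the j-th clause, shifted by offset
            copy = {lit + offset if lit > 0 else lit - offset for lit in clause}
            copy.add(signs[j] * (j + 1))          # add the j-th literal of D_i
            new_clauses.append(frozenset(copy))
        offset += num_vars
    return new_clauses, offset

def is_linear(clauses):
    """Any two distinct clauses share at most one variable."""
    var_sets = [{abs(lit) for lit in c} for c in clauses]
    return all(len(a & b) <= 1 for a, b in combinations(var_sets, 2))

def is_satisfiable(clauses, num_vars):
    """Brute force over all 2^num_vars assignments."""
    for bits in product((False, True), repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

F0, n0 = [frozenset()], 0           # only the empty clause
F1, n1 = next_formula(F0, n0)       # 2 clauses
F2, n2 = next_formula(F1, n1)       # 8 clauses over 6 variables
```

The clause counts 1, 2, 8, 2048 follow the recursion $|F_{k+1}| = |F_k| 2^{|F_k|}$, which is the tower-like growth mentioned in the introduction.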

Using induction, it is not difficult to see that the formulas $F_k$ are strictly treelike.
We will prove the upper bound in Theorem 1.2 by giving a probabilistic construction of
a comparably small unsatisfiable linear k-CNF formula. Our construction consists of two
steps. First, we construct a linear k-uniform hypergraph H that is "dense" in the sense
that $m/n$ is large, where m and n are the number of hyperedges and vertices, respectively,
and then transform it randomly into a linear k-CNF formula F that is unsatisfiable with
high probability.

Lemma 2.1. If there is a linear k-uniform hypergraph H with n vertices and m edges such
that $m/n \ge 2^k$, then there is an unsatisfiable linear k-CNF formula with m clauses.

Proof. Let $H = (V,E)$. By viewing V as a set of variables and E as a set of clauses
(each containing only positive literals), this is a (satisfiable) linear k-CNF formula. We
replace each literal in each clause by its complement with probability $\frac{1}{2}$,
independently in each clause. Let F denote the resulting (random) formula. For any fixed truth
assignment $\alpha$, it holds that $\Pr[\alpha \text{ satisfies } F] = (1 - 2^{-k})^m$. Hence
the expected number of satisfying assignments of F is

$2^n (1 - 2^{-k})^m < 2^n e^{-2^{-k} m} = e^{\ln(2) n - 2^{-k} m} < 1$ ,

where the last inequality follows from $m/n \ge 2^k$. Hence some formula F has fewer than one
satisfying assignment, i.e., none. ■
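The first-moment calculation can be checked numerically. The following sketch (the function name is an assumption) evaluates the base-2 logarithm of the expected number of satisfying assignments, $2^n (1 - 2^{-k})^m$; a negative value means the expectation is below one.

```python
import math

def log2_expected_satisfying(n, m, k):
    """log2 of 2^n * (1 - 2^-k)^m, the expected number of satisfying
    assignments of the randomly signed formula with n variables, m clauses."""
    return n + m * math.log2(1.0 - 2.0 ** (-k))
```

For example, with k = 3, n = 10 and m = 80 (so that m/n = 2^k) the value is about -5.4, whereas with m = 10 it is positive and the argument gives no conclusion.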

How can we construct a dense linear hypergraph? We use a construction by Kuzjurin [13].
Our application of this construction is motivated by Kostochka and Rödl [11], who use it
to construct linear hypergraphs of large chromatic number.

Lemma 2.2. For any prime power q and any $k \in \mathbb{N}$, there exists a k-uniform linear
hypergraph with $kq$ vertices and $q^2$ edges.

With $n = kq$, this hypergraph has $n^2/k^2$ hyperedges. This is almost optimal, since
any linear k-uniform hypergraph on n vertices has at most $\binom{n}{2} / \binom{k}{2}$
hyperedges: The n vertices provide us with $\binom{n}{2}$ vertex pairs. Each hyperedge
occupies $\binom{k}{2}$ pairs, and because of linearity, no pair can be occupied by more than
one hyperedge.

Proof. Choose the vertex set $V = V_1 \uplus \dots \uplus V_k$, where each $V_i$ is a disjoint
copy of the finite field GF(q). The hyperedges consist of all k-tuples $(x_1, \dots, x_k)$
with $x_i \in V_i$, $1 \le i \le k$, such that

$$
\begin{pmatrix}
1 & 1 & \cdots & 1 & \cdots & 1 \\
1 & 2 & \cdots & i & \cdots & k \\
1 & 4 & \cdots & i^2 & \cdots & k^2 \\
\vdots & \vdots & & \vdots & & \vdots \\
1 & 2^{k-3} & \cdots & i^{k-3} & \cdots & k^{k-3}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix}
= 0 . \qquad (2.1)
$$

Consider two distinct vertices $x \in V_i$, $y \in V_j$. How many hyperedges contain both of
them? If $i = j$, none. If $i \ne j$, we can find out by plugging the fixed values $x, y$ into
(2.1). We obtain a (possibly inhomogeneous) $(k-2) \times (k-2)$ linear system with a
Vandermonde matrix, which has a unique solution. In other words, $x$ and $y$ are in exactly one
hyperedge, and the hypergraph is linear. By the same argument, there are exactly $q^2$
hyperedges. ■
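For small parameters the construction can be carried out by exhaustive search. The sketch below makes two simplifying assumptions that are not in the lemma: q is a prime (so that arithmetic modulo q realizes GF(q)) and $q \ge k$ (so that $1, \dots, k$ are distinct field elements). The function names are hypothetical, and the enumeration over all $q^k$ tuples is only meant for tiny cases.

```python
from itertools import combinations, product

def vandermonde_hypergraph(k, q):
    """Hyperedges of the k-uniform hypergraph defined by system (2.1),
    assuming q is prime and q >= k. A vertex is a pair (i, x): element x
    of the i-th copy of GF(q)."""
    rows = [[pow(i, j, q) for i in range(1, k + 1)] for j in range(k - 2)]
    edges = []
    for xs in product(range(q), repeat=k):
        if all(sum(c * x for c, x in zip(row, xs)) % q == 0 for row in rows):
            edges.append(frozenset(enumerate(xs)))
    return edges

def is_linear_hypergraph(edges):
    """Any two distinct hyperedges share at most one vertex."""
    return all(len(e & f) <= 1 for e, f in combinations(edges, 2))
```

For k = 4 and q = 5 this yields 25 hyperedges on 20 vertices, matching the count $q^2$ in the lemma.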
Proof of the upper bound in Theorem 1.2. Choose a prime power
$q \in \{k2^k, \dots, 2k2^k - 1\}$ (such a q exists; for instance, this range contains a power
of 2). By Lemma 2.2 there is a linear k-uniform hypergraph H with $n = qk$ vertices and
$m = q^2$ hyperedges. Since $\frac{m}{n} = \frac{q}{k} \ge 2^k$, Lemma 2.1 shows that there is
an unsatisfiable linear k-CNF formula with $q^2 < 4k^2 4^k$ clauses. ■
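The arithmetic of this proof can be verified for concrete k. The sketch below picks q as the smallest power of 2 that is at least $k 2^k$; this particular choice and the function names are assumptions made for illustration, any prime power in the stated range works equally well.

```python
def smallest_power_of_two_at_least(x):
    """Smallest power of 2 that is >= x (for x >= 1)."""
    q = 1
    while q < x:
        q *= 2
    return q

def upper_bound_parameters(k):
    """Return (q, n, m): the chosen prime power, the number of vertices k*q
    and the number of hyperedges q^2 of the hypergraph from Lemma 2.2."""
    q = smallest_power_of_two_at_least(k * 2 ** k)
    return q, k * q, q * q
```

For k = 3 this gives q = 32, hence a linear 3-uniform hypergraph with 96 vertices and 1024 hyperedges, and thus an unsatisfiable linear 3-CNF formula with 1024 clauses.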

Let us prove the lower bound of Theorem 1.2. For a literal $u$ and a CNF formula
F, we write $\mathrm{occ}_F(u) := |\{C \in F \mid u \in C\}|$, the degree of the literal $u$.
Thus $d_F(x) = \mathrm{occ}_F(x) + \mathrm{occ}_F(\bar{x})$. We write
$\mathrm{occ}(F) = \max_u \mathrm{occ}_F(u)$. In analogy to $f(k)$, we define
$f_{\mathrm{occ}}(k)$ to be the largest integer d such that any k-CNF formula F with
$\mathrm{occ}(F) \le d$ is satisfiable. Clearly $f_{\mathrm{occ}}(k) \ge \frac{f(k)}{2}$, and
thus from [12] it follows that $f_{\mathrm{occ}}(k) \ge \frac{2^k}{2ek}$. Actually, an
application of the Lopsided Lovász Local Lemma [6, 1, 14] yields
$f_{\mathrm{occ}}(k) \ge \frac{2^k}{ek} - 1$.

Lemma 2.3. Let F be a weakly linear k-CNF formula with at most $1 + f_{\mathrm{occ}}(k-1)$
literals of degree at least $1 + f_{\mathrm{occ}}(k-1)$. Then F is satisfiable.

Proof. Transform F into a (k-1)-CNF formula $F'$ by removing in every clause in F a literal
of maximum degree. We claim that $\mathrm{occ}_{F'}(u) \le f_{\mathrm{occ}}(k-1)$ for every
literal $u$. Therefore $F'$ is satisfiable, and F is, as well.

For the sake of contradiction, suppose there is a literal $u$ such that
$t := \mathrm{occ}_{F'}(u) \ge 1 + f_{\mathrm{occ}}(k-1)$. Let $C'_i$, $i = 1, 2, \dots, t$,
be the clauses in $F'$ containing $u$. $C'_i$ is obtained by removing some literal $v_i$ from
some clause $C_i \in F$. By construction of $F'$,
$\mathrm{occ}_F(v_i) \ge \mathrm{occ}_F(u) \ge f_{\mathrm{occ}}(k-1) + 1$ for all
$1 \le i \le t$. The $v_i$ are pairwise distinct: If $v_i = v_j$, then
$\{u, v_i\} \subseteq C_i \cap C_j$. Since F is weakly linear, this can only mean $i = j$. Now
$u, v_1, v_2, \dots, v_t$ are $t + 1 \ge 2 + f_{\mathrm{occ}}(k-1)$ literals of degree at
least $1 + f_{\mathrm{occ}}(k-1)$ in F, a contradiction. ■

We see that an unsatisfiable weakly linear k-CNF formula has at least
$f_{\mathrm{occ}}(k-1) + 2 \ge \frac{2^k}{2e(k-1)} + 1$ literals of degree at least
$f_{\mathrm{occ}}(k-1) + 1 \ge \frac{2^k}{2e(k-1)}$. Double counting yields
$k|F| = \sum_u \mathrm{occ}_F(u) \ge \frac{4^k}{4e^2(k-1)^2}$, thus
$|F| \ge \frac{4^k}{4e^2k(k-1)^2}$. By a more careful argument, we can
improve this by a factor of k. We call a hypergraph (j,d)-rich if at least j vertices have
degree at least d. The following lemma is due to Welzl [22].

Lemma 2.4. For $d \in \mathbb{N}_0$, every linear (d,d)-rich hypergraph has at least
$\binom{d+1}{2}$ edges. This bound is tight for all $d \in \mathbb{N}_0$.

Proof. We proceed by induction over d. Clearly, the assertion of the lemma is true for $d = 0$.
Now let $H = (V,E)$ be a linear (d,d)-rich hypergraph for $d \ge 1$. Choose some vertex $v$ of
degree at least d in H and let $H' = (V, E')$ be the hypergraph with
$E' := E \setminus \{e \in E \mid e \ni v\}$. We have (i) $|E| \ge |E'| + d$, (ii) $H'$ is
linear, since this property is inherited when edges are removed, and (iii) $H'$ is
(d-1,d-1)-rich, since for no vertex other than $v$ the degree decreases by more than 1 due to
the linearity of H. It follows that $|E| \ge \binom{d}{2} + d = \binom{d+1}{2}$.
The complete 2-uniform hypergraph (graph, so to say) on $d + 1$ vertices shows that the
bound given is tight for all $d \in \mathbb{N}_0$. ■

Proof of the lower bound in Theorem 1.2. A weakly linear k-CNF formula F is a linear
k-uniform hypergraph, with literals as vertices. If F is unsatisfiable, then by Lemma 2.3 it is
$(f_{\mathrm{occ}}(k-1) + 1, f_{\mathrm{occ}}(k-1) + 1)$-rich. By Lemma 2.4, F has at least
$\binom{f_{\mathrm{occ}}(k-1)+2}{2} \ge \frac{4^k}{8e^2(k-1)^2}$ clauses. ■
Figure 1: A resolution tree, with its edges labeled in the obvious way. Every clause is
unsatisfied when applying the assignments on the path to the root.



There is an obvious generalization of the notion of being linear. We say a CNF formula F
is b-linear if any two distinct clauses $C, D \in F$ fulfill
$|\mathrm{vbl}(C) \cap \mathrm{vbl}(D)| \le b$, and weakly b-linear if $|C \cap D| \le b$ holds
for all distinct $C, D \in F$. Thus, a (weakly) 1-linear formula is (weakly) linear. We can
generalize Theorem 1.2 for $b \ge 2$. However, the proofs do not introduce new ideas and go
along the lines of the proofs presented above.

Theorem 2.5. Let $b \ge 2$. Every weakly b-linear k-CNF formula with at most
$\frac{2^{k(1+1/b)}}{2^{b+2} e^2 k^{2+1/b}}$ clauses is satisfiable. There exists an
unsatisfiable b-linear k-CNF formula with at most $2^{b+1}(k2^k)^{1+1/b}$ clauses.



3. Proof of Theorem 1.4

Let F be an unsatisfiable weakly linear k-CNF formula, and let T be a resolution tree
of minimal size of F. We want to show that T has a large number of nodes. It is not
difficult to see that a resolution tree of minimal size is regular, meaning that no variable is
resolved more than once on a path from a leaf to the root. See Urquhart [21], Lemma 5.1,
for a proof of this fact. We take a random walk of length $\ell$ in T starting at the root, in
every step choosing randomly to go to one of the two children of the current node. If we
arrive at a leaf, we stay there. We claim that if $\ell \le \sqrt{2^{k-2}}$, then with
probability at least $\frac{1}{2}$, our walk does not end at a leaf. Thus, T has at least
$2^{\ell-1}$ inner vertices at distance $\ell$ from the root, thus at least $2^{2^{k/2-1}}$
leaves.

As illustrated in Figure 1, we label each edge in T with an assignment. If C is the
resolvent of $D_1$ and $D_2$, with $x \in D_1$ and $\bar{x} \in D_2$, we label the edge from C
to $D_1$ by $x \mapsto 0$ and from C to $D_2$ by $x \mapsto 1$. Each path from the root to a
node gives a partial assignment $\alpha$. If that node is labeled with clause C, then C
evaluates to false under $\alpha$. In our random walk, let $\alpha_i$ denote the partial
assignment associated with the first i steps. $\alpha_0$ is the empty assignment, and
$\alpha_i$ assigns exactly i variables (if we are not yet at a leaf). We set
$F_i := F^{[\alpha_i]}$, i.e., the formula obtained from F by fixing the variables according to
the partial assignment $\alpha_i$. For a formula G, we define the weight $w(G)$ to be

$w(G) := \sum_{C \in G, |C| \le k-2} 2^{k-|C|}$ .   (3.1)
Since F is a k-CNF formula, $w(F) = 0$. If some formula G contains the empty clause, then
$w(G) \ge 2^k$. In our random walk, $w(F_i)$ is a random variable.

Lemma 3.1. $\mathrm{E}[w(F_{i+1})] \le \mathrm{E}[w(F_i)] + 4i$.

Since $w(F_0) = 0$, this implies $\mathrm{E}[w(F_\ell)] \le 4\binom{\ell}{2} \le 2\ell^2$. If
our random walk ends at a leaf, then $F_\ell$ contains the empty clause, thus
$w(F_\ell) \ge 2^k$. Therefore
$2\ell^2 \ge \mathrm{E}[w(F_\ell)] \ge 2^k \Pr[\text{the random walk ends at a leaf}]$. We
conclude that at least half of all paths of length $\ell^* = \sqrt{2^{k-2}}$ starting at the
root do not end at a leaf. Thus T has at least $2^{\ell^*-1}$ internal nodes at distance
$\ell^*$ from the root, and thus at least $2^{\ell^*}$ leaves, which proves the theorem.
It remains to prove the lemma.

Proof of the lemma. For a formula G and a variable $x$, let $d_{k-1}(x, G)$ denote the number
of (k-1)-clauses of G containing $x$ or $\bar{x}$. Since $F_0 = F$ is a k-CNF formula,
$d_{k-1}(x, F_0) = 0$ for all variables $x$. We claim that
$d_{k-1}(x, F_{i+1}) \le d_{k-1}(x, F_i) + 2$ for every variable $x$. To see this, note that in
step i, some variable $y$ is set to $b \in \{0, 1\}$, say to 0. At most one k-clause of $F_i$
contains $y$ and $x$, and at most one contains $y$ and $\bar{x}$, since $F_i$ is weakly linear,
thus $d_{k-1}(x, F_{i+1}) \le d_{k-1}(x, F_i) + 2$. It follows immediately that
$d_{k-1}(x, F_i) \le 2i$.

Consider $w(F_i)$, which was defined in (3.1). $F_{i+1}$ is obtained from $F_i$ by setting some
variable $y$ randomly to 0 or 1. Consider a clause C. How does its contribution to (3.1) change
when setting $y$? If (i) $y \notin \mathrm{vbl}(C)$ or $|C| = k$, it does not change. If (ii)
$y \in \mathrm{vbl}(C)$ and $|C| \le k-2$, then with probability $\frac{1}{2}$ each, its
contribution to (3.1) doubles or vanishes. Hence in expectation, it does not change. If (iii)
$y \in \mathrm{vbl}(C)$ and $|C| = k-1$, then C contributes nothing to $w(F_i)$, and with
probability $\frac{1}{2}$, it contributes 4 to $w(F_{i+1})$. In expectation, its contribution
to (3.1) increases by 2. Case (iii) applies to at most $d_{k-1}(y, F_i) \le 2i$ clauses.
Hence $\mathrm{E}[w(F_{i+1})] \le \mathrm{E}[w(F_i)] + 4i$. ■
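The weight function and the three cases of the proof can be traced on toy inputs. The sketch below assumes the encoding used for illustration throughout (integer literals, frozenset clauses); the names `weight` and `restrict` are hypothetical.

```python
def weight(formula, k):
    """w(G) from (3.1): every clause C with |C| <= k-2 contributes 2^(k-|C|)."""
    return sum(2 ** (k - len(c)) for c in formula if len(c) <= k - 2)

def restrict(formula, var, value):
    """Fix var to value: satisfied clauses vanish, the falsified literal is
    removed from all other clauses."""
    true_lit = var if value else -var
    return [c - {-true_lit} for c in formula if true_lit not in c]

k = 3
two_clause = [frozenset({1, 2})]          # a (k-1)-clause: contributes nothing
shrunk = restrict(two_clause, 1, False)   # becomes a (k-2)-clause
```

With k = 3, the 2-clause has weight 0; setting one of its variables the "wrong" way leaves a 1-clause of weight 4, and setting it the other way removes the clause, which is case (iii) of the proof.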

4. Proof of Theorem 1.5

Let F be a strictly treelike weakly linear k-CNF formula, and let T be a strict
resolution tree of F. Letters a, b, c denote nodes of T, and u, v, w denote literals. Every
node a of T is labeled with a clause $C_a$. We define a graph $G_a$ with vertex set $C_a$,
connecting $u, v \in C_a$ if $u, v \in D$ for some clause $D \in F$ that occurs as a label of a
leaf in the subtree of a. Since T is a strict resolution tree and F is weakly linear, every
edge in $G_a$ comes from a unique leaf of T. Resolution now has a simple interpretation as a
"calculus on graphs", see Figure 2. If a is a leaf, then $G_a = K_k$. Since the root of a
resolution tree is labeled with the empty clause, we have
$G_{\mathrm{root}} = (\emptyset, \emptyset)$. For a graph G, let $\kappa_i(G)$ denote the
minimum size of a set $U \subseteq V(G)$ such that $G - U$ contains no i-clique. Here, $G - U$
is the subgraph of G induced by $V(G) \setminus U$. Thus, $\kappa_1(G) = |V(G)|$, and
$\kappa_2(G)$ is the size of a minimum vertex cover of G. For the complete graph $K_k$,
$\kappa_i(K_k) = k - i + 1$. We write $\kappa_i(a) := \kappa_i(G_a)$. The tuple
$(\kappa_1(a), \dots, \kappa_k(a))$ can be viewed as the complexity measure for a. We observe
that if a is a leaf, then $\kappa_i(a) = k - i + 1$, and $\kappa_i(\mathrm{root}) = 0$, for all
$1 \le i \le k$. If a is an ancestor of b in T, let dist(a, b) denote the number of edges in the
T-path from a to b. Since one resolution step deletes one literal (and may add several), the
next proposition is immediate:

Proposition 4.1. If b is a descendant of a in T, then
$\kappa_i(b) \le \kappa_i(a) + \mathrm{dist}(a, b)$.
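For small graphs the parameter $\kappa_i$ can be computed by exhaustive search, which makes the identities stated above easy to verify. The sketch below is exponential-time and only meant for toy graphs; the graph representation (a vertex list plus a set of two-element frozensets as edges) and all names are assumptions made for illustration.

```python
from itertools import combinations

def has_clique(vertices, edges, i):
    """True if some i vertices are pairwise adjacent (i >= 1)."""
    return any(all(frozenset(p) in edges for p in combinations(s, 2))
               for s in combinations(vertices, i))

def kappa(vertices, edges, i):
    """Minimum size of a vertex set U such that G - U contains no i-clique."""
    vertices = list(vertices)
    for size in range(len(vertices) + 1):
        for removed in combinations(vertices, size):
            rest = [v for v in vertices if v not in removed]
            if not has_clique(rest, edges, i):
                return size

def complete_graph(k):
    """Vertices and edges of the complete graph K_k."""
    vertices = list(range(k))
    return vertices, {frozenset(p) for p in combinations(vertices, 2)}
```

On $K_5$ the values $\kappa_1, \dots, \kappa_5$ are 5, 4, 3, 2, 1, in line with $\kappa_i(K_k) = k - i + 1$.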



UNSATISFIABLE LINEAR CNF FORMULAS ARE LARGE AND COMPLEX 









Figure 2: Resolution as a calculus on graphs. A resolution step amounts to deleting the 
resolved vertex and taking the union of the two graphs. 



At this point we want to give an intuition of the proofs that follow. Our goal is to show
that if the values $\kappa_i(a)$ are small for some node a in the tree, then the subtree of a
is big. The proof goes roughly as follows: If the subtree of a is small, then there are many
descendants b of a that are not too far from a and have even smaller subtrees. By induction,
we will be able to show that $\kappa_{i+1}(b)$ is fairly large. Thus, on the path from b to a,
not all (i+1)-cliques are destroyed, and every such descendant b of a provides $G_a$ with an
(i+1)-clique. These cliques need not be vertex-disjoint, but they are edge-disjoint. This
implies that $G_a$ has many vertex-disjoint i-cliques, a contradiction to $\kappa_i(a)$ being
small. To make this intuition precise, we have to define what small and big actually mean in
this context: We fix a value $1 \le \ell \le k$ and define $\theta_i$ and $\nu_i$ for
$1 \le i \le \ell$ as follows: $\theta_\ell := \lfloor \frac{k-\ell+1}{2} \rfloor - 1$ and
$\nu_\ell := 1$, and for $1 \le i < \ell$, we inductively define $\theta_i$ :



2"i+i«;+i- 



1 and v; 



One 



should not worry about these ugly expressions too much; they are only chosen that way to
make the induction go through. For the right value of $\ell$, one checks that $\theta_1$ is a
tower function in k. More precisely, for any $\epsilon > 0$, there exists a $c \in \mathbb{N}$
such that when choosing $\ell = k - c$, then
$\theta_1 \ge \mathrm{tower}_{2-\epsilon}(k - c)$. The following theorem is a more precise
version of Theorem 1.5.

Theorem 4.2. Let F be a strictly treelike linear k-CNF formula. Then F has at least
$2^{\nu_1\theta_1}$ clauses.

Proof. A node a in T is i-extendable if $\kappa_j(a) \le \theta_j$ for each
$i \le j \le \ell$. We observe that if a is i-extendable, it is also (i+1)-extendable. For
$i = \ell + 1$, the condition is void, so every node is $(\ell+1)$-extendable. Also, the root
is 1-extendable, since $\kappa_i(\mathrm{root}) = 0$.

Definition 4.3. A set A of descendants of a in T such that (i) no vertex in A is an ancestor
of any other vertex in A and (ii) $\mathrm{dist}(a, b) \le d$ for all $b \in A$ is called an
antichain of a at distance at most d. If furthermore every $b \in A$ is i-extendable, we call A
an i-extendable antichain.

Lemma 4.4. Let $1 \le i \le \ell$, and let a be a node in T. If a is i-extendable, then there
is an (i+1)-extendable antichain A of a at distance at most $\theta_i$ such that
$|A| = 2^{\nu_i\theta_i}$.

Proof. We use induction on $\ell - i$. For the base case $i = \ell$, we have
$\kappa_\ell(a) \le \theta_\ell$, as a is $\ell$-extendable. Since each leaf b of T has
$\kappa_\ell(b) = k - \ell + 1 \ge 2\theta_\ell + 2$, Proposition 4.1 tells us that every leaf
in the subtree of a has distance at least $\theta_\ell + 2$ from a. Since T is a complete
binary tree, there are $2^{\theta_\ell}$ descendants of a at distance exactly $\theta_\ell$
from a. This is the desired antichain A of a. Since every node is $(\ell+1)$-extendable, the
base case holds. For the step, let a be i-extendable, for $1 \le i < \ell$.

Figure 3: Illustration of the claim in the proof of Lemma 4.4. If node a is i-extendable,
and b is a close (i+1)-extendable descendant of a, then b itself has many close
descendants A, at least half of which are (i+1)-extendable themselves.



Claim: Let b be a descendant of a with $\mathrm{dist}(a, b) \le \theta_i$. If b is
(i+1)-extendable, then there is an (i+1)-extendable antichain A of b at distance at most
$\theta_{i+1}$ of size $2^{\nu_{i+1}\theta_{i+1}-1}$.



Proof of the claim. By applying the induction hypothesis of the lemma to b, there is an
(i+2)-extendable antichain A of b at distance at most $\theta_{i+1}$ of size
$2^{\nu_{i+1}\theta_{i+1}}$. Let
$A_{\mathrm{good}} := \{c \in A \mid \kappa_{i+1}(c) \le \theta_{i+1}\}$. This is an
(i+1)-extendable antichain. If $A_{\mathrm{good}}$ contains at least half of A, we are done.
See Figure 3 for an illustration. Write $A_{\mathrm{bad}} := A \setminus A_{\mathrm{good}}$ and
suppose for the sake of contradiction that
$|A_{\mathrm{bad}}| > 2^{\nu_{i+1}\theta_{i+1}-1}$. Consider any $c \in A_{\mathrm{bad}}$. On
the path from c to b, in each step some literal gets removed (and others may be added). Let P
denote the set of the removed literals. Then $C_c \setminus P \subseteq C_b$, and $G_c - P$ is
a subgraph of $G_b$. Node c is not (i+1)-extendable, thus
$\kappa_{i+1}(c) \ge \theta_{i+1} + 1$. Since $|P| = \mathrm{dist}(b, c) \le \theta_{i+1}$, the
graph $G_c - P$ contains at least one (i+1)-clique, which is also contained in $G_b$. This
holds for every $c \in A_{\mathrm{bad}}$, and by weak linearity, $G_b$ contains at least
$|A_{\mathrm{bad}}|$ edge-disjoint (i+1)-cliques. Since b is (i+1)-extendable, there exists a
set $U \subseteq V(G_b)$, $|U| = \kappa_{i+1}(b)$, such that $G_b - U$ contains no
(i+1)-clique. Each of the $|A_{\mathrm{bad}}|$ edge-disjoint (i+1)-cliques of $G_b$ contains
some vertex of U, thus some vertex $v \in U$ is contained in at least
$\frac{|A_{\mathrm{bad}}|}{|U|} \ge \frac{2^{\nu_{i+1}\theta_{i+1}-1}}{\theta_{i+1}} \ge 2\theta_i + 1$
edge-disjoint (i+1)-cliques. Two such cliques overlap in no vertex besides $v$, hence $G_b$
contains at least $2\theta_i + 1$ vertex-disjoint i-cliques, thus
$\kappa_i(b) \ge 2\theta_i + 1$. By Proposition 4.1,
$\kappa_i(a) \ge \kappa_i(b) - \mathrm{dist}(a, b) \ge \theta_i + 1$. This contradicts the
assumption of Lemma 4.4 that a is i-extendable. We conclude that
$|A_{\mathrm{bad}}| \le 2^{\nu_{i+1}\theta_{i+1}-1}$, which proves the claim. ■

Let us continue with the proof of the lemma. If A is an (i+1)-extendable antichain
of a at distance $d \le \theta_i$, then by the claim for each vertex $b \in A$ there exists an
(i+1)-extendable antichain of b at distance at most $\theta_{i+1}$, of size
$2^{\nu_{i+1}\theta_{i+1}-1}$. Their union is an (i+1)-extendable antichain $A'$ of a at
distance at most $d + \theta_{i+1}$ of size $|A| 2^{\nu_{i+1}\theta_{i+1}-1}$.
Hence we can "inflate" A to $A'$, as long as $d \le \theta_i$. Starting with the (i+1)-extendable

antichain {a} and inflate it times, and obtain a final (i + l)-extendable antichain of 

I — 

a at distance at most 9{ of size at least (2^+ ie ' l + 1 ~ 1 ) 
Applying Lemma 4.4 to the root of T, which is 1-extendable, we obtain an antichain A
of $2^{\nu_1\theta_1}$ nodes. Since T has at least $|A|$ leaves, this proves the theorem. ■

5. Open Problems 

Let $f_{\mathrm{lin}}(k)$ be the largest integer d such that any linear k-CNF formula F with
$d(F) \le d$ is satisfiable. Clearly $f_{\mathrm{lin}}(k) \ge f(k)$, and from the proof of the
upper bound in Theorem 1.2 it follows that $f_{\mathrm{lin}}(k) \le 2k2^k$. Is there a
significant gap between $f(k)$ and $f_{\mathrm{lin}}(k)$? It is not difficult to show that
$f(2) = f_{\mathrm{lin}}(2) = 2$, but we do not know the value of $f_{\mathrm{lin}}(k)$ for
any $k \ge 3$. What do unsatisfiable linear k-CNF formulas look like? Can one find an explicit
construction of an unsatisfiable linear k-CNF formula whose size is singly exponential in k?
We suspect one has to come up with some algebraic construction. What is the resolution
complexity of linear k-CNF formulas? Tree resolution complexity is doubly exponential in
k. We suspect the same to be true for general resolution.
Appendix A. Proof of Theorem 2.5

To prove the lower bound, we need an analog of Lemma 2.3.

Lemma A.1. Let F be a weakly b-linear k-CNF formula. Denote by $\ell$ the number of literals
occurring in at least $f_{\mathrm{occ}}(k-b) + 1$ clauses. If
$\binom{\ell-1}{b} \le f_{\mathrm{occ}}(k-b)$, then F is satisfiable.

Proof. Transform F into a (k-b)-CNF formula $F'$ by removing in every clause in F some b
literals of maximum degree, breaking ties arbitrarily. We claim that
$\mathrm{occ}_{F'}(u) \le f_{\mathrm{occ}}(k-b)$ for all literals $u$. Since $F'$ is a
(k-b)-CNF formula, this means that $F'$ is satisfiable, thus F is, too.

Suppose $u$ is a literal with $t := \mathrm{occ}_{F'}(u) \ge 1 + f_{\mathrm{occ}}(k-b)$; this
implies $\mathrm{occ}_F(u) \ge 1 + f_{\mathrm{occ}}(k-b)$. Let $C'_i$, $i = 1, 2, \dots, t$, be
the clauses in $F'$ with $C'_i \ni u$. Let $v_1^{(i)}, \dots, v_b^{(i)}$ be the b literals that
were removed so that $C'_i$ was obtained. All these literals fulfill
$\mathrm{occ}_F(v_j^{(i)}) \ge \mathrm{occ}_F(u) \ge 1 + f_{\mathrm{occ}}(k-b)$. Observe that
because F is weakly b-linear, the sets $\{v_1^{(i)}, \dots, v_b^{(i)}\}$, $1 \le i \le t$, are
t distinct sets of cardinality b not containing $u$. Therefore
$t \le \binom{\ell-1}{b} \le f_{\mathrm{occ}}(k-b)$. This is a contradiction since we assumed
$t \ge 1 + f_{\mathrm{occ}}(k-b)$. ■

If F is an unsatisfiable weakly b-linear k-CNF formula, then by Lemma A.1 the number
$\ell$ of literals $u$ with $\mathrm{occ}_F(u) \ge f_{\mathrm{occ}}(k-b) + 1$ fulfills
$\binom{\ell-1}{b} \ge f_{\mathrm{occ}}(k-b) + 1$. Thus,

$\frac{2^{k-b}}{ek} \le f_{\mathrm{occ}}(k-b) + 1 \le \binom{\ell-1}{b} \le \frac{(\ell-1)^b}{b!}$ ,

and, since $b! \ge 2^{b-1}$ for $b \ge 2$, we obtain
$\ell \ge \sqrt[b]{\frac{2^{k-1}}{ek}} + 1$. F contains at least $\ell$ many literals
of degree at least $f_{\mathrm{occ}}(k-b) + 1$, therefore

$|F| \ge \frac{2^{k(1+1/b)}}{2^{b+2} e^2 k^{2+1/b}}$ .

This is the lower bound of Theorem 2.5.



For an upper bound, we construct a b-linear hypergraph H with n vertices and m
hyperedges such that $m/n \ge 2^k$. Lemma 2.1 is easily seen to extend to b-linear formulas and
hypergraphs and yields an unsatisfiable formula with m clauses. We need a generalization
of Lemma 2.2.

Lemma A.2. For any prime power q, any $k \in \mathbb{N}$, and $b \in \{1, \dots, k\}$, there
exists a k-uniform b-linear hypergraph with $kq$ vertices and $q^{1+b}$ edges.



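Proof. Choose the vertex set $V = V_1 \uplus \dots \uplus V_k$, where each $V_i$ is a disjoint
copy of the finite field GF(q). The hyperedges consist of all k-tuples $(x_1, \dots, x_k)$
with $x_i \in V_i$, $1 \le i \le k$, such that

$$
\begin{pmatrix}
1 & 1 & \cdots & 1 & \cdots & 1 \\
1 & 2 & \cdots & i & \cdots & k \\
1 & 4 & \cdots & i^2 & \cdots & k^2 \\
\vdots & \vdots & & \vdots & & \vdots \\
1 & 2^{k-b-2} & \cdots & i^{k-b-2} & \cdots & k^{k-b-2}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix}
= 0 . \qquad (A.1)
$$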
The hyperedges form a b-linear hypergraph. To see this, consider $b + 1$ distinct vertices
$x_0 \in V_{i_0}, x_1 \in V_{i_1}, \dots, x_b \in V_{i_b}$. How many hyperedges contain all of
these $b + 1$ vertices? If the indices $i_0, \dots, i_b$ are not distinct, none does.
Otherwise, we plug in the fixed values $x_0, \dots, x_b$ into (A.1). We obtain a (possibly
inhomogeneous) $(k-b-1) \times (k-b-1)$ linear system with a Vandermonde matrix, which has a
unique solution. Hence those $b + 1$ vertices are in exactly one common hyperedge. By the same
argument, there are exactly $q^{1+b}$ hyperedges. ■

We choose a prime power q such that $\sqrt[b]{k2^k} \le q < 2\sqrt[b]{k2^k}$. By Lemma A.2
there is a b-linear hypergraph with $n = kq$ vertices and $m = q^{1+b}$ hyperedges. Then
$\frac{m}{n} = \frac{q^b}{k} \ge 2^k$, and by Lemma 2.1 there is an unsatisfiable b-linear
k-CNF formula with at most $m \le 2^{b+1}(k2^k)^{1+1/b}$ clauses. This proves Theorem 2.5.